Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: prevent fast ascii comparison if char is not letter #15774

Merged
merged 3 commits into from
Jan 16, 2025

Conversation

svennergr
Copy link
Contributor

What this PR does / why we need it:
The current containsLower implementation would only work for letters, as the c != s && c+'a'-'A' != s && c != s+'a'-'A' would never work comparing | against\, for example. That will be the case if simple case insensitive regex filters, like |= `(?i)\|` , that should only match against strings that contain |.

This added letter check makes the benchmark worse, but still better than the old version before #15076 was introduced. Maybe there are other ideas of improving?

                                                │   old.txt   │               new.txt                │
                                                │   sec/op    │    sec/op     vs base                │
ContainsLower/short_line_no_match-10              16.27n ± 0%    18.39n ± 1%   +13.06% (p=0.002 n=6)
ContainsLower/short_line_with_match-10            23.91n ± 1%    39.03n ± 2%   +63.22% (p=0.002 n=6)
ContainsLower/long_line_no_match-10               212.2n ± 0%    273.8n ± 4%   +29.03% (p=0.002 n=6)
ContainsLower/long_line_match_start-10            8.937n ± 0%   27.800n ± 1%  +211.07% (p=0.002 n=6)
ContainsLower/long_line_match_middle-10           98.20n ± 0%   124.45n ± 1%   +26.72% (p=0.002 n=6)
ContainsLower/long_line_match_end-10              231.8n ± 1%    305.3n ± 2%   +31.66% (p=0.002 n=6)
ContainsLower/short_unicode_line_no_match-10      47.92n ± 9%    57.74n ± 2%   +20.49% (p=0.002 n=6)
ContainsLower/short_unicode_line_with_match-10    35.47n ± 1%    53.71n ± 1%   +51.42% (p=0.002 n=6)
ContainsLower/long_unicode_line_no_match-10       231.3n ± 3%    274.0n ± 0%   +18.41% (p=0.002 n=6)
ContainsLower/long_unicode_line_match_start-10    233.8n ± 3%    240.4n ± 1%    +2.80% (p=0.037 n=6)
ContainsLower/long_unicode_line_match_middle-10   89.11n ± 1%   111.15n ± 1%   +24.74% (p=0.002 n=6)
ContainsLower/long_unicode_line_match_end-10      335.6n ± 1%    381.9n ± 2%   +13.78% (p=0.002 n=6)
geomean                                           77.39n         105.0n        +35.69%

@svennergr svennergr added the type/bug Somehing is not working as expected label Jan 15, 2025
@svennergr svennergr requested a review from cyriltovena January 15, 2025 16:17
@svennergr svennergr requested a review from a team as a code owner January 15, 2025 16:17
@@ -421,6 +421,9 @@ func (l equalFilter) String() string {
}

func newEqualFilter(match []byte, caseInsensitive bool) MatcherFilterer {
if caseInsensitive {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @MasslessParticle contains expects lowered values.

Not sure why we use contains for equality match btw.

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cyriltovena cyriltovena merged commit 9182add into main Jan 16, 2025
59 checks passed
@cyriltovena cyriltovena deleted the svennergr/fix-contains-lower-comparison branch January 16, 2025 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/L type/bug Somehing is not working as expected
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants